Abstract

We describe the application of model inference based on reference priors to two concrete examples in
high energy physics: the determination of the CKM matrix parameters $\bar\rho$ and $\bar\eta$, and the
determination of the parameters $m_0$ and $m_{1/2}$ in a simplified version of the CMSSM SUSY model.
We show how a 1-dimensional reference posterior can be mapped to the n-dimensional (n-D) parameter
space of the given class of models, under a minimal set of conditions on the n-D function. This
reference-based function can be used as a prior for the next iteration of inference, using Bayes'
theorem recursively.

1 Introduction

It is typical in high energy physics (HEP) to deal with classes of models, e.g. new physics extensions of
the Standard Model (SM), differing by the values of a set of (typically continuous) unknown parameters.
Given a set of experimental measurements, one would like to define the region of the model
parameter space that is in agreement with the data. This is what we refer to as Model Inference. The
following ingredients are needed:

- a theoretical tool that predicts the expected values of the measured observables, given a point in
the model parameter space;

- a multi-dimensional likelihood, built from the available measurements;

- and a statistical procedure that evaluates the level of agreement between the data and the predictions.

While the first and second steps are not controversial, the third step is often polemical and is
subject to some degree of arbitrariness. Two main approaches are typically followed: Bayesian, which
computes the posterior probability of the expected values of the model parameters given the likelihood
and a prior probability, and frequentist, which provides probability statements about possible values of
the measurements given the observed data and assumed values of the model parameters.

Historically, most high energy physicists have preferred frequentist statistics because (they say)
it allows one to extract statistical information from data without the need for subjective input. In this
sense, these physicists are victims of the Utopian idea of an analyst-free analysis, in which the "data speak
for themselves", independently of the personal opinion and judgement of the physicists who perform the
analysis. However, we are rudely awakened from this Utopian dream on a daily basis, as anybody who
has had to evaluate a systematic uncertainty can confirm.^1 Beyond this simple fact, we also tend to
underestimate how strongly the subjective beliefs of the analyst enter the earlier stages of an analysis,
as for instance when we define the form of the likelihood. Physicists quote results as $m \pm \sigma$, where
$m$ and $\sigma$ summarize the result of, perhaps, a likelihood-ratio-based analysis, which already implies
assumptions about the form of the likelihood.^2 When estimating the systematic uncertainty, we typically
sum the different contributions in quadrature, implying that the systematic errors are uncorrelated and,
more importantly, that they may be treated as if they are statistical. This may be true of systematic
uncertainties that arise ultimately from other statistics; but many systematic uncertainties are "assigned"
based on judgement or official policy.

^1 About 10% of the hep-ex papers on INSPIRE match the search for the word assume, which is quite far
from the analyst-free paradigm of our dreams.

^2 The use of the likelihood ratio to quote the range $[m - \sigma, m + \sigma]$ for the unknown $x$ as the 68%
confidence region implies that we can define a one-to-one transformation $x \to f(x)$ such that the
likelihood is Gaussian in $f(x)$.

We push this even further when we perform phenomenological analyses. While connecting the 
parameters of a model to the experimental observables, we often need to know a set of additional quan- 
tities (theoretical nuisance parameters) which are not measurable, but which may be known with some 
uncertainty through a theoretical calculation. This is the case, for instance, for the non-perturbative QCD 
parameters determined using lattice QCD calculations. In order to take into account the uncertainty on 
the predictions correctly, a Bayesian analyst would introduce a prior probability density function (pdf) 
for the theoretical nuisance parameters based on the best judgement of the theorist. While this is
considered dangerously subjective by many high energy physicists, the same physicists consider it safe
to modify the likelihood to take account of the theoretical uncertainty on predictions. This breaks the 
objective-frequentist-physicists paradigm twice: i) the functional form used to account for theoretical 
uncertainty is no less subjective than the prior of a Bayesian analysis and ii) the likelihood loses its deep 
and precise meaning of that function obtained by inserting the observations into the probability density 
function describing possible observations. Nobody ever did (and it is likely that nobody ever will)
measure the theoretical nuisance parameters; indeed, many of them, such as the factorization and
renormalization scales, are pure artifacts of our current reliance on perturbation theory in theoretical
calculations. As a matter of fact, a physicist performing data analysis is forced to make assumptions. And
there is nothing wrong with that as long as the assumptions are clearly stated. The problems come when 
the assumptions are hidden in the procedure and not transparent to the people not directly involved in the 
analysis. 

The contrasting attitudes described above can be summarized in terms of the following two per- 
ceived problems: 

- For some high energy physicists, introducing a prior is unacceptable because it brings subjectivity 
into science. "The origin of the problem lies in the very first Bayesian assumption, namely that 
unknown model parameters are to be understood as mathematical objects distributed according 
to PDFs, which are assumed to be known: the priors. Obviously, the choice of the priors cannot 
be irrelevant; hence, the Bayesian treatment is doomed to lead to results which depend on the 
decisions made, necessarily on an unscientific basis, by the authors of a given analysis, for the 
choice of these extraordinary PDFs." [1].

- For some statisticians, a meaningful statistical analysis is not possible in the absence of an analysis 
procedure that allows one to incorporate a priori knowledge in a coherent way. "The frequentist 
approach to hypothesis testing does not permit researchers to place probabilities of being correct 
on the competing hypotheses. This is because of the limitations on mathematical probabilities used 
by frequentists. For the frequentists, probabilities can only be defined for random variables, and
hypotheses are not variables (they are not observables)... This limitation for frequentists is a real
drawback because the applied researcher would really like to be able to place a degree of belief
on the hypothesis. He or she would like to see how the weight of evidence modifies his/her degree
of belief (probability) on the hypothesis being true." [2].

The use of reference priors [3] is emerging as a concrete way to solve the two problems. While a
detailed discussion of the reference priors is beyond the scope of this paper, we highlight here their most
appealing properties.

The main concern against the use of a Bayesian analysis in HEP is related to a priori ignorance, 
more than a priori knowledge. Whenever a priori knowledge is available (e.g. the measurement of the 
luminosity, which is used to translate an observed signal yield into a cross section measurement), there 
is a general consensus that an evidence-based prior should be used. The real issue is how we should
parameterize "ignorance". The use of a flat prior, a HEP standard, is not quite the right answer. Reference
priors can be seen as a model of ignorance in the sense that, on average, they maximize the influence of
the likelihood relative to the prior; hence they are a solution to this problem. More precisely, for a given
likelihood, the reference prior is the prior function that on average maximizes the asymptotic Kullback- 
Leibler divergence [4] between the prior and the posterior, hence enhancing the role of the likelihood (the 
data) over the prior. This is exactly the kind of behavior that we would like for a model of ignorance. And 
this is what we assume the flat prior does for us, when we use it. Unfortunately, the flatness of the prior 
is not invariant under reparameterization. Unlike the flat prior, reference priors give reparameterization- 
invariant results in the cases typically considered in HEP (e.g. one-to-one transformations for which the 
Jacobian is not singular [5]). The use of reference priors in HEP has been recently proposed in Ref. [6],
where the application in the case of a counting experiment is discussed. This has been applied to real
LHC data, in one of the CMS Supersymmetry (SUSY) searches [7].

In the following, we apply the procedure described in Ref. [6] to two specific cases: i) the
determination of the parameters $\bar\rho$ and $\bar\eta$ (at fixed $A$ and $\lambda$) of a simplified CKM matrix and
ii) the determination of the parameters in the case of a SUSY model.^3 In both cases, as an illustration,
we limit the discussion to the determination of two parameters. The generalization to n > 2 dimensions
is computationally more demanding, but conceptually equivalent. In both cases, we start from one
experimental measurement, for which the likelihood can be analytically modeled without too much
arbitrariness. We briefly describe the derivation of the reference posterior, following Ref. [6]. We then
map the 1-D posterior into an n-D (n = 2 in our examples) function of the model parameters, introducing
the look-alike (LL) prescription. This function, based on a reference prior, can then be used as the prior
in a recursive application of Bayes' theorem to include other measurements.

2 The reference posterior for a 1-D analysis 

When looking for a signal, produced by the process under study, we are confronted with a Poisson count 
of a signal on top of a background coming from other physics processes. The likelihood for the signal, 
in the absence of a background, is described by a Poisson function. In the presence of a background the 
likelihood asymptotically converges to a Gaussian density. Under these conditions, the reference prior is
Jeffreys' prior for a Poisson likelihood, $\pi(\theta) \sim 1/\sqrt{\theta}$.
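
As a minimal numerical sketch of this statement, one can check that a Poisson likelihood for an observed count n, multiplied by the prior $1/\sqrt{\theta}$, gives a Gamma density with shape n + 1/2 and unit rate. The function names and the count n = 7 below are illustrative assumptions, not part of the analyses discussed in this paper.

```python
import math

def jeffreys_poisson_posterior(theta, n):
    """Posterior density p(theta | n) for a Poisson count n with prior 1/sqrt(theta).

    Likelihood times theta**-0.5 is a Gamma(shape = n + 1/2, rate = 1) density.
    """
    if theta <= 0.0:
        return 0.0
    shape = n + 0.5
    log_pdf = (shape - 1.0) * math.log(theta) - theta - math.lgamma(shape)
    return math.exp(log_pdf)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

n_obs = 7  # hypothetical observed count, for illustration only
norm = integrate(lambda t: jeffreys_poisson_posterior(t, n_obs), 0.0, 60.0)
mean = integrate(lambda t: t * jeffreys_poisson_posterior(t, n_obs), 0.0, 60.0)
```

Here `norm` should come out close to 1 and `mean` close to n + 1/2, the mean of the Gamma density.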

This is the case for the exclusive measurement of $|V_{ub}|$ from $B \to \pi \ell \nu$ decays. What one measures
is the branching ratio $BR(B \to \pi \ell \nu)$, which is related to the absolute value of the CKM matrix
element $V_{ub}$ as:

$$|V_{ub}|^2 \propto BR(B \to \pi \ell \nu), \qquad (1)$$

where the proportionality factor depends on the width of the $B$ meson $\Gamma_B$ and on the $B \to \pi$ form
factor $F(B \to \pi)$. Evidence-based priors are available for both: for $\Gamma_B$ from other measurements and
for $F(B \to \pi)$ from theory. One can determine the reference posterior for the BR using
$\pi(BR) \sim 1/\sqrt{BR}$.

For SUSY searches, one looks for a signal yield $s$ in a signal-sensitive box, defined by a selection
using signal-vs-background separating variables. One observes a yield $n$ with expected value $s + \mu$,
where $\mu$ is the expected background surviving the signal-enhancing selection. The expected background
$\mu$ is estimated from a sideband region where no signal is expected, where the observed yield in the
sideband is $y$ and the scaling factor $b$ is such that $b\mu$ is the expected background yield in the
sideband. In formulas:

- the likelihood is $p(n|s, \mu) = (s + \mu)^n e^{-(s+\mu)} / n!$,

- the prior for $\mu$ is $\pi(\mu) = b (b\mu)^{y-1/2} e^{-b\mu} / \Gamma(y + 1/2)$ and

- the prior on $s$ is $\pi(s) = \pi(s|\mu) \propto 1/\sqrt{s + \mu}$,

where $\Gamma(x)$ is the gamma function.
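
The three ingredients above can be combined numerically. The sketch below marginalizes the product of the likelihood, $\pi(s|\mu)$ and $\pi(\mu)$ over $\mu$ on a grid and normalizes the result in $s$. It assumes that the improper prior $1/\sqrt{s+\mu}$ is used exactly as written, without any $\mu$-dependent normalization. The grid ranges and the values n = 20, y = 50, b = 10 are hypothetical choices for illustration.

```python
import math

def log_prior_mu(mu, y, b):
    """Log of the gamma density b (b mu)^(y-1/2) exp(-b mu) / Gamma(y + 1/2)."""
    return math.log(b) + (y - 0.5) * math.log(b * mu) - b * mu - math.lgamma(y + 0.5)

def log_joint(s, mu, n, y, b):
    """Log of likelihood * pi(s|mu) * pi(mu); the prior on s is left unnormalized."""
    lam = s + mu
    log_like = n * math.log(lam) - lam - math.lgamma(n + 1)
    log_prior_s = -0.5 * math.log(lam)  # pi(s|mu) proportional to 1/sqrt(s + mu)
    return log_like + log_prior_s + log_prior_mu(mu, y, b)

def posterior_s(n, y, b, s_max=50.0, mu_max=15.0, steps=200):
    """Marginal posterior of s on a midpoint grid, normalized to unit integral."""
    ds, dmu = s_max / steps, mu_max / steps
    s_grid = [(i + 0.5) * ds for i in range(steps)]
    mu_grid = [(j + 0.5) * dmu for j in range(steps)]
    dens = [sum(math.exp(log_joint(s, mu, n, y, b)) for mu in mu_grid) * dmu
            for s in s_grid]
    norm = sum(dens) * ds
    return s_grid, [d / norm for d in dens], ds

# Hypothetical inputs: 20 events in the signal box, 50 in a sideband with b = 10.
s_grid, post, ds = posterior_s(n=20, y=50, b=10.0)
s_mode = s_grid[post.index(max(post))]
```

With these inputs the sideband points to a background of about 5 events, so the posterior for $s$ is expected to peak near 14 to 15.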



^3 For simplicity, we take a simplified CMSSM with $m_0$ and $m_{1/2}$, fixing $A_0 = 0$, $\tan\beta = 10$, and positive $\mu$.



Once the 1-D reference-posterior is derived as described above, we translate this into an n-D 
function of the model parameters. While rigorous algorithms exist to build the n-D reference prior [3],
we follow here a computationally simpler heuristic construction, described below, which we call the 
look-alike prescription. 

3 Look-alike prescription 

Mapping the 1-D reference posterior to the n-D space of the model parameters under consideration can
be achieved by demanding that the n-D pdf satisfy two requirements:

- all models predicting the same values for the parameter $x$ ($|V_{ub}|$ and $s$ in our two examples)
associated with the posterior density $P(x)$ are equi-probable and

- the n-D function should be such that it maps back to a 1-D function $P'(x)$ identical to the $P(x)$
with which we started.

Given the mapping $\theta \to x$ predicted by the physics model, these requirements are sufficient to map
$P(x)$ to $\pi(\theta)$. We first write the n-D function as $\pi(\theta) = K(x(\theta)) \times P(x(\theta))$. The
computation of $K(x(\theta))$ goes as follows:



$$P'(x) = \int d\theta \, P(x(\theta)) \, K(x(\theta)) \, \delta(x - x(\theta))$$

$$\phantom{P'(x)} = P(x) \, K(x) \int d\theta \, \delta(x - x(\theta)) = P(x), \qquad (2)$$



where the last equality follows from the second condition. This implies that

$$K(x) = \frac{1}{\int d\theta \, \delta(x - x(\theta))}, \qquad (4)$$

which is the inverse of the surface of the region spanned by the look-alike (LL) models, that is, models
giving the same value $x$.^4
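
The surface term can be approximated numerically on a grid, which is a convenient cross-check whenever it is not available in closed form. The sketch below replaces the delta function with a narrow top-hat of width `dx` and counts the 2-D grid cells falling inside it. The helper names, the grid settings and the test mapping $x(\theta) = \sqrt{\theta_1^2 + \theta_2^2}$ are assumptions made for illustration; for this mapping the surface is the circumference $2\pi x$.

```python
import math

def surface_term(x_of_theta, x0, dx, lo, hi, n):
    """Approximate the integral of delta(x0 - x(theta)) over the square [lo, hi]^2.

    The delta function is replaced by a top-hat of width dx, so the result is
    (area of the cells with |x(theta) - x0| < dx/2) / dx.
    """
    h = (hi - lo) / n
    count = 0
    for i in range(n):
        t1 = lo + (i + 0.5) * h
        for j in range(n):
            t2 = lo + (j + 0.5) * h
            if abs(x_of_theta(t1, t2) - x0) < 0.5 * dx:
                count += 1
    return count * h * h / dx

def radius(t1, t2):
    """Toy mapping theta -> x whose look-alike domains are circles."""
    return math.hypot(t1, t2)

surface = surface_term(radius, x0=1.0, dx=0.05, lo=-2.0, hi=2.0, n=400)
K = 1.0 / surface  # inverse of the surface term at x0 = 1
```

For `x0 = 1` the estimate should be close to $2\pi$, up to the discretization error of the grid.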

The case of $|V_{ub}|$ is useful because it allows us to explain how this works in practice. All the models
such that $\bar\rho^2 + \bar\eta^2 = k^2$ predict the same value of $|V_{ub}|$. This makes them LL models, by our
definition. The LL domain is a circle centered at the origin with radius $k$, so that the surface term is
its circumference $2\pi k$. The n-D function is therefore

$$\pi(\bar\rho, \bar\eta) = \frac{P(|V_{ub}|)}{2\pi\sqrt{\bar\rho^2 + \bar\eta^2}},$$

where $P(|V_{ub}|)$ is the reference posterior for $|V_{ub}|$. The function $\pi(\bar\rho, \bar\eta)$ is then used as the
prior to fit the CKM matrix [8] including the measurement of the CKM phase $\gamma$. This step gives the
allowed region for $\bar\rho$ and $\bar\eta$ shown in the left plot of Fig. 1, which is to be compared to a similar
plot obtained using flat priors for $\bar\rho$ and $\bar\eta$ (right plot of Fig. 1). The results of these two
calculations are consistent. However, the reference posterior for $|V_{ub}|$ provides a more solid foundation
for determining the prior to associate with the CKM parameters.

Our second case study, which uses a simplified version of the mSUGRA model, is more complicated
since Eq. (4) cannot be solved analytically in the case of a generic search for new physics.
In this case, the LL domain is given by all the models predicting the same expected signal yield $s$.
The expected signal yield as a function of the model parameters can be written as
$s(m_0, m_{1/2}) = \epsilon(m_0, m_{1/2}) \, \sigma(m_0, m_{1/2}) \, \mathcal{L}$, where only the luminosity
$\mathcal{L}$ is a constant, while both the cross section $\sigma$ and the efficiency of the applied selection
$\epsilon$ depend on the features of the model (e.g. the masses of the SUSY particles), and hence on the
model parameters. The function $\sigma(m_0, m_{1/2})$ can be computed from the SUSY Lagrangian, while
$\epsilon(m_0, m_{1/2})$ has a non-trivial dependence on the models, through several effects connected to
the detector response. For instance, a model with large (small) mass differences would give a large
(small) value of $\epsilon$, since harder (softer) spectra for the visible particles produced in the SUSY
decay chain will have a larger (smaller) chance to survive the kinematic cuts. In general, the connection
between the features of the model and the detector performance produces non-analytical iso-yield
contours for the LL domains. This is illustrated in Fig. 2, where $\epsilon(m_0, m_{1/2})$ and
$\sigma(m_0, m_{1/2})$ are shown in the case of a hypothetical SUSY search [9].

^4 The challenge of generalizing this approach to a generic n-D problem is the calculation of this surface term.

Fig. 1: Result for the 2-D allowed region for the CKM parameters $\bar\rho$ and $\bar\eta$, obtained using the reference posterior
for $|V_{ub}|$ and the LL prescription (left), or using flat priors for $\bar\rho$ and $\bar\eta$ (right).

Fig. 2: $\epsilon(m_0, m_{1/2})$ and $\sigma(m_0, m_{1/2})$ functions in the case of a hypothetical SUSY search [9].

On the other hand, all the iso-yield contours have infinite length, resulting in constant $K(m_0, m_{1/2})$
if one considers the full domain for $m_0$ and $m_{1/2}$, and approximately constant if one uses a large-enough
domain in practice.^5 We can then take $K(m_0, m_{1/2})$ as a constant and show how the method works.
It has to be clearly stated that this is an approximation, and that the computation of the surface term of
Eq. (4) is the main challenge in the applicability of the proposed method in its exact form (see Ref. [9] for
details).

^5 In case the measurement points to a particular region of the plane, i.e. when there is a hint of a signal, one could use the
Savage prescription and cut the plot where the likelihood drops to negligible values. In the absence of a signal hint, the situation is
complicated by the fact that the likelihood peaks at infinite values of $m_0$ and $m_{1/2}$, where the SUSY particles are so heavy that

Fig. 3: Result for the 2-D function mapped from the 1-D reference posterior in the case of 1 pb$^{-1}$ (left), 100 pb$^{-1}$
(center), and 500 pb$^{-1}$ (right) integrated luminosity. We assume a measurement perfectly in agreement with the
expectation from the true model, corresponding to $m_0 = 60$, $m_{1/2} = 250$.

For illustration, we take the CMS low mass (LM) point [10] ($m_0 = 60$, $m_{1/2} = 250$) as the true
state of nature and we simulate the case of an experiment giving a result exactly at the expectation, for
low (1 pb$^{-1}$), moderate (100 pb$^{-1}$), and large (500 pb$^{-1}$) statistics. Figure 3 shows the 2-D function
obtained by the LL prescription. With increasing sample size, the function shows a peak corresponding
to the true value and to all its (degenerate) LL models, showing the consistency of the procedure.
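
The constant-$K$ approximation can be illustrated with a toy sketch. The cross section and efficiency used below are purely hypothetical functions, chosen only so that the iso-yield contours are simple; they are not the CMSSM predictions of Fig. 2. For brevity, the 1-D posterior is taken to be the background-free Gamma density with shape n + 1/2, which is a further simplifying assumption.

```python
import math

def toy_yield(m0, m12, lumi):
    """Hypothetical s = eps * sigma * L; NOT the real CMSSM cross section or efficiency."""
    sigma = 50.0 * math.exp(-(m0 / 400.0 + m12 / 100.0))  # toy cross section [pb]
    eps = 0.3                                             # toy flat efficiency
    return eps * sigma * lumi

def posterior_1d(s, n):
    """Gamma(n + 1/2, 1) density: background-free Poisson count with a 1/sqrt(s) prior."""
    return math.exp((n - 0.5) * math.log(s) - s - math.lgamma(n + 0.5))

def ll_function(m0, m12, n, lumi):
    """Unnormalized 2-D function P(s(m0, m12)), taking K as a constant."""
    return posterior_1d(toy_yield(m0, m12, lumi), n)

lumi = 100.0                                   # integrated luminosity [pb^-1]
n_obs = round(toy_yield(60.0, 250.0, lumi))    # measurement exactly at the expectation
at_truth = ll_function(60.0, 250.0, n_obs, lumi)
at_lookalike = ll_function(460.0, 150.0, n_obs, lumi)  # same toy yield as the true point
far_away = ll_function(60.0, 400.0, n_obs, lumi)
```

In this toy, the true point and its look-alike receive the same value of the 2-D function, while a point with a very different expected yield is strongly suppressed, which mirrors the degenerate ridge described above.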



4 Conclusions 

We described the application of reference priors to 1-D cases (typical of a HEP measurement) and how this
can be used to define an n-D function of the model, induced by the 1-D reference posterior, which may
then be used as a prior for further applications (e.g. to fit the model parameters). The connection between
the 1-D posterior on a measurable quantity $s$ (e.g. a signal yield on top of a background $b$) and an n-D
function of a set of interesting parameters (e.g. the parameters of a SUSY model) is established through
the look-alike prescription, which defines a heuristic procedure on the basis of two minimal conditions:
i) the models predicting the same expected value for the interesting variable $s$ are equi-probable and ii)
the n-D function should map back to the 1-D reference posterior for $s$, from which we started. This
requires the calculation of a surface term (see Eq. (4)), which can be performed numerically [9]. While in
specific cases this choice of a prior might be in conflict with a subjective assessment that could favor one
region of the parameter space over another, it should be stressed that this Bayesian approach is likely to
give the best frequentist performance because of the good frequentist properties of reference priors.

We provided two simplified 2-D examples to illustrate the method, for which computational
complications are absent or marginal. Work is in progress to extend this procedure to more realistic cases [9].